Conversation
|
Have you run the benchmarks for this? |
sorry, new here, could you tell me how do I run them? 😅 |
I think "cargo bench -p arrow-cast" should be sufficient. |
|
run benchmark cast_kernels |
|
Thank you @Rafferty97 and @aryan-212 -- I kicked off some benchmark runs to verify the performance implications of this change |
alamb
left a comment
There was a problem hiding this comment.
Thank you for this PR @aryan-212 and (especially) for the help reviwing @Rafferty97
| impl Parser for Float16Type { | ||
| fn parse(string: &str) -> Option<f16> { | ||
| lexical_core::parse(string.as_bytes()) | ||
| lexical_core::parse(string.trim().as_bytes()) |
There was a problem hiding this comment.
I wonder if there will be any performance implications 🤔
| #[test] | ||
| fn test_parse_trimmed_whitespace() { | ||
| // Float types | ||
| assert_eq!(Float16Type::parse(" 1.5 "), Some(f16::from_f32(1.5))); |
There was a problem hiding this comment.
Cn you please add some tests with only leading whitespace and some tests with only trailing whitespace?
|
🤖 Arrow criterion benchmark running (GKE) | trigger |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Details
Resource Usagebase (merge-base)
branch
|
Which issue does this PR close?
Rationale for this change
The
Parser::parseimplementations for numeric types did not trim whitespace before parsing. This caused values like " 42 " or " 1.5 " to fail parsing and returnNone, even though they represent valid numbers.What changes are included in this PR?
.trim()calls before parsing in FloatType Parser implementations.string.trim()at the top of the parser_primitive! macro, which covers all integer and duration types.Are these changes tested?
Yes. Added test_parse_trimmed_whitespace covering:
Datafusion changes
For the following SQL :-
in datafusion we used to get
now after these changes, we get
this behaviour is now aligned with Databricks
Are there any user-facing changes?
Yes. Numeric parsing now accepts strings with leading/trailing whitespace. This is a relaxation of the previous behaviour (previously
None, nowSome(value)), so it is not a breaking change.